Diffusion Model

  • Class-conditioned image generation: [1]

  • Image-to-image translation: [4], [7], [6], [8]

  • Image-to-image translation with guidance: GLIDE[2](global: text), [20](global: text, sketch), [21](local: text), [22](local: text, image), ControlNet[23](global: mixture), T2I-Adapter[24](global: mixture), [25](local: mixture), [26](global: text), Uni-ControlNet[27]; most of these sample with classifier-free guidance (see the note after this list)

  • Unpaired image-to-image translation: [19], [28], [29]

  • Image composition: SDEdit [17], ILVR [6], [5], [9], [15] (the SDEdit noise-and-denoise recipe is sketched in the note after this list)

  • Image inpainting: [10], [11], [12], [13]

  • Mask prediction: [31] derives masks from cross-attention maps with post-processing; [32] adds one extra output channel for the mask; [33] predicts masks from the feature maps at early denoising steps.
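
Two recurring mechanisms behind the items above, summarized in standard DDPM notation (a paraphrase of the cited papers, not any single implementation):

Classifier-free guidance (used by GLIDE [2] and, in practice, by most of the Stable-Diffusion-based adapters above): the noise prediction is extrapolated away from the unconditional prediction toward the conditional one,

    \hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + w \, \big( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \big)

where c is the condition (text, sketch, layout, ...), \varnothing the null condition, and w \ge 1 the guidance scale.

SDEdit-style composition/editing [8][17]: perturb the guide or composite image to an intermediate timestep t_0 and run the learned reverse process from there,

    x_{t_0} = \sqrt{\bar\alpha_{t_0}} \, x_{\text{guide}} + \sqrt{1 - \bar\alpha_{t_0}} \, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),

then denoise x_{t_0} \to x_0; a larger t_0 trades faithfulness to the guide for realism.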

Milestones: DDPM [3], Stable Diffusion v1, v2, XL, v3
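
For reference, DDPM's forward process and simplified training objective, in the notation of Ho et al. [3]:

    q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1 - \bar\alpha_t) I\big), \qquad \bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s)

    L_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon}\big[\, \| \epsilon - \epsilon_\theta(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,\ t) \|^2 \,\big]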

Acceleration: DDIM [14], PLMS [16]
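
DDIM [14] keeps the same trained network but replaces the stochastic reverse step with a non-Markovian update that can be made deterministic and applied on a sub-sequence of timesteps; rewritten with the \bar\alpha notation above:

    x_{t-1} = \sqrt{\bar\alpha_{t-1}} \left( \frac{x_t - \sqrt{1 - \bar\alpha_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}} \right) + \sqrt{1 - \bar\alpha_{t-1} - \sigma_t^2}\, \epsilon_\theta(x_t, t) + \sigma_t \epsilon_t

Setting \sigma_t = 0 gives the deterministic DDIM sampler; PLMS [16] treats this update as a numerical ODE step and applies pseudo linear multi-step methods for further speedups.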

High-resolution: [34] progressive training

Lightweight / on-device: SnapFusion [35]

Failure case analyses: [30]

Surveys

Tutorial materials: [a] [b]

References

[1] Dhariwal, Prafulla, and Alex Nichol. “Diffusion models beat GANs on image synthesis.” arXiv preprint arXiv:2105.05233 (2021).

[2] Nichol, Alex, et al. “GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models.” arXiv preprint arXiv:2112.10741 (2021).

[3] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models.” Advances in Neural Information Processing Systems 33 (2020): 6840-6851.

[4] Wang, Tengfei, et al. “Pretraining is All You Need for Image-to-Image Translation.” arXiv preprint arXiv:2205.12952 (2022).

[5] Hachnochi, Roy, et al. “Cross-domain Compositing with Pretrained Diffusion Models.” arXiv preprint arXiv:2302.10167 (2023).

[6] Choi, Jooyoung, et al. “ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models.” ICCV, 2021.

[7] Kwon, Gihyun, and Jong Chul Ye. “Diffusion-based image translation using disentangled style and content representation.” ICLR, 2023.

[8] Meng, Chenlin, et al. “SDEdit: Guided image synthesis and editing with stochastic differential equations.” ICLR, 2021.

[9] Yang, Binxin, et al. “Paint by Example: Exemplar-based Image Editing with Diffusion Models.” arXiv preprint arXiv:2211.13227 (2022).

[10] Lugmayr, Andreas, et al. “RePaint: Inpainting using denoising diffusion probabilistic models.” CVPR, 2022.

[11] Rombach, Robin, et al. “High-resolution image synthesis with latent diffusion models.” CVPR, 2022.

[12] Li, Wenbo, et al. “SDM: Spatial Diffusion Model for Large Hole Image Inpainting.” arXiv preprint arXiv:2212.02963 (2022).

[13] Wang, Su, et al. “Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting.” arXiv preprint arXiv:2212.06909 (2022).

[14] Song, Jiaming, Chenlin Meng, and Stefano Ermon. “Denoising diffusion implicit models.” arXiv preprint arXiv:2010.02502 (2020).

[15] Song, Yizhi, et al. “ObjectStitch: Generative Object Compositing.” CVPR, 2023.

[16] Liu, Luping, et al. “Pseudo numerical methods for diffusion models on manifolds.” ICLR, 2022.

[17] Meng, Chenlin, et al. “SDEdit: Guided image synthesis and editing with stochastic differential equations.” ICLR, 2021.

[19] Kwon, Gihyun, and Jong Chul Ye. “Diffusion-based image translation using disentangled style and content representation.” ICLR, 2023.

[20] Voynov, Andrey, Kfir Aberman, and Daniel Cohen-Or. “Sketch-Guided Text-to-Image Diffusion Models.” arXiv preprint arXiv:2211.13752 (2022).

[21] Yang, Zhengyuan, et al. “ReCo: Region-Controlled Text-to-Image Generation.” arXiv preprint arXiv:2211.15518 (2022).

[22] Li, Yuheng, et al. “GLIGEN: Open-Set Grounded Text-to-Image Generation.” arXiv preprint arXiv:2301.07093 (2023).

[23] Zhang, Lvmin, and Maneesh Agrawala. “Adding conditional control to text-to-image diffusion models.” arXiv preprint arXiv:2302.05543 (2023).

[24] Mou, Chong, et al. “T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models.” arXiv preprint arXiv:2302.08453 (2023).

[25] Huang, Lianghua, et al. “Composer: Creative and controllable image synthesis with composable conditions.” arXiv preprint arXiv:2302.09778 (2023).

[26] Wei, Yuxiang, et al. “ELITE: Encoding visual concepts into textual embeddings for customized text-to-image generation.” arXiv preprint arXiv:2302.13848 (2023).

[27] Zhao, Shihao, et al. “Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models.” arXiv preprint arXiv:2305.16322 (2023).

[28] Sasaki, Hiroshi, Chris G. Willcocks, and Toby P. Breckon. “UNIT-DDPM: Unpaired image translation with denoising diffusion probabilistic models.” arXiv preprint arXiv:2104.05358 (2021).

[29] Su, Xuan, et al. “Dual diffusion implicit bridges for image-to-image translation.” arXiv preprint arXiv:2203.08382 (2022).

[30] Du, Chengbin, et al. “Stable Diffusion is Unstable.” NeurIPS, 2023.

[31] Wu, Weijia, et al. “DiffuMask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models.” arXiv preprint arXiv:2303.11681 (2023).

[32] Xie, Shaoan, et al. “SmartBrush: Text and shape guided object inpainting with diffusion model.” CVPR, 2023.

[33] Ma, Jian, et al. “GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently.” arXiv preprint arXiv:2303.17870 (2023).

[34] Gu, Jiatao, et al. “Matryoshka Diffusion Models.” arXiv preprint arXiv:2310.15111 (2023).

[35] Li, Yanyu, et al. “SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds.” NeurIPS, 2023.